On Deep Representation Learning from Noisy Web Images
Authors
Abstract
The ever-growing volume of Web images may be the next important data source for scaling up deep neural networks, which recently achieved great success in the ImageNet classification challenge and related tasks. This prospect, however, has not been validated on convolutional networks (convnets), one of the best-performing deep models, because of their supervised training regime. While unsupervised alternatives are not as good as convnets at generalizing the learned model to new domains, we use convnets to leverage semi-supervised representation learning. Our approach is to use massive amounts of unlabeled and noisy Web images to train convnets as general feature detectors, despite challenges in the data such as a high proportion of mislabeled images, outliers, and data biases. Extensive experiments were conducted at several data scales, with different network architectures and data reranking techniques. The learned representations are evaluated on nine public datasets covering various topics. The best results obtained by our convnets, trained on 3.14 million Web images, outperform AlexNet [27] trained on the 1.2 million clean images of ILSVRC 2012 and are closing the gap with VGG-16 [40]. These promising results suggest a budget solution for using deep learning in practice and motivate more research in semi-supervised representation learning.
Similar resources
A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels
The recent success of deep neural networks is powered in part by large-scale, well-labeled training data. However, it is a daunting task to laboriously annotate an ImageNet-like dataset. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...
WebVision Database: Visual Learning and Understanding from Web Data
In this paper, we present a study on learning visual recognition models from large-scale noisy web data. We build a new database called WebVision, which contains more than 2.4 million web images crawled from the Internet by using queries generated from the 1,000 semantic concepts of the ILSVRC 2012 benchmark. Meta information along with those web images (e.g., title, description, tags, etc.) ar...
Similarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...
Prajna: Towards Recognizing Whatever You Want from Images without Image Labeling
With the advances in distributed computation, machine learning, and deep neural networks, we are entering an era in which it is possible to build a real-world image recognition system. There are three essential components to build a real-world image recognition system: 1) creating representative features, 2) designing powerful learning approaches, and 3) identifying massive training data. While extens...
Journal title:
- CoRR
Volume: abs/1512.04785
Pages: -
Publication date: 2015